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ABSTRACT 

We investigate sets of random variables that can be arranged sequentially such that a given variable 
only depends conditionally on its immediate predecessor. For such sets, we show that the full joint 
probability distribution may be expressed exclusively in terms of uni- and bivariate marginals. Under 
the assumption that the CMB power spectrum likelihood only exhibits correlations within a banded 
multipole range, Ai, we apply this expression to two outstanding problems in CMB likelihood analysis. 
First, we derive a statistically well-defined hybrid likelihood estimator, merging two independent (e.g., 
low- and high-^) likelihoods into a single expression that properly accounts for correlations between 
the two. Applying this expression to the WMAP likelihood, we verify that the effect of correlations 
on cosmological parameters in the transition region is negligible in terms of cosmological parameters 
for WMAP; the largest relative shift seen for any parameter is O.OGcr. However, because this may not 
hold for other experimental setups (e.g., for different instrumental noise properties or analysis masks), 
but must rather be verified on a case-by-case basis, we recommend our new hybridization scheme 
for future experiments for statistical self-consistency reasons. Second, we use the same expression to 
improve the convergence rate of the Blackwell-Rao likelihood estimator, reducing the required number 
of Monte Carlo samples by several orders of magnitude, and thereby extend it to high-£ applications. 
Subject headings: cosmic microwave background — cosmology: observations — methods: statistical 



1. INTRODUCTION 

The cosmic microwave background (CMB) radiation 
is one of the most pristine sources of information about 
the early Universe available to us. Since its discovery 
in 1964 (Penzias & Wilson 1965), the amount of infor- 
mation available to us about the CMB has increased at 
a rapid pace through series of ground-based, sub-orbital 
and satellite experiments. The recently released Planck 
temperature sky maps (Planck I 2013) is just the latest 
example of how the present challenge in the field of cos- 
mology is one of overabundance rather than shortage of 
data. 

To extract cosmological parameters from these ever 
growing data sets requires increasingly sophisticated and 
efficient algorithms, both due to larger data volumes 
and to more stringent requirements to statistical preci- 
sion. For example, the COBE-DMR sky maps published 
twenty years ago (Smoot et al. 1992) comprised 0(10"') 
pixels, and could be analyzed using exact brute-force 
likelihood techniques (e.g., Gorski 1994), with a com- 
putational scaling of ©(iVpj^). The WMAP sky maps 

published ten years ago comprised 0(10^) pixels (Ben- 
nett et al. 2003a), at which point faster and approximate 
methods had to be used for parameter estimation (Hivon 
et al. 2002; Verde et al. 2003). However, for WMAP 
the error budget was still dominated by cosmic variance 
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on large angular scales and instrumental noise on small 
angular scales, and confusion with Galactic and extra- 
Galactic emission was minimal, allowing for very simple 
component separation methods (Bennett et al. 2003b; 
Hinshaw et al. 2003). For Planck, the total number of 
data points in nine frequency bands is 0(3 • 10^), and 
instrumental noise never dominates the uncertainties at 
any angular scales, as small-scale astrophysical confusion 
becomes important at multipoles £ > 1500 (Planck XII 
2013). As a result, an unprecendented study of all im- 
portant sources of uncertainty, including instrumental, 
systematic and astrophysical, was required for Planck to 
reach its ambitious goals (Planck XV 2013). 

With the advent of these massive mega-pixel data 
sets, a number different analysis strategies have been 
developed to robustly extract cosmological parameters 
with acceptable computational cost. As of today, the 
preferred option for full-sky high-resolution experiments 
such as Planck and WMAP is to divide the analysis into 
two separate components according to large and small 
angular scales, and merge the two at the likelihood level. 
On large angular scales, they use a Gibbs sampling based 
(JeweU et al. 2004; Wandelt et al. 2004; Eriksen et al. 
2004, 2007) Blackwell-Rao estimator (Chu et al. 2005) 
that takes into account the full non-Gaussian structure 
of the true CMB likelihood, while on small angular scales, 
they use faster approaches (e.g., Hivon et al. 2002; Rocha 
et al. 2011; Planck XV 2013) coupled to an analytic 
multivariate Gaussian (and/or log-normal) likelihood ap- 
proximation. The computational cost of this hybrid ap- 
proach is dominated by spherical harmonics transforms, 
and therefore scales as 0{N ! ), which is acceptable even 
for large data sets. However, there is an unsolved prob- 
lem associated with this hybrid approach, and that is how 
to merge the two likelihood components into a single all- 
scale expression; correlations between the smallest scales 



in the large-scale likelihood and the largest scales in the 
small-scale likelihood should in principle be accounted 
for. As of today, no fully satisfactory solution to this 
exists in the CMB literature, though various approaches 
were explored during the Planck analysis. 

Having a computational scaling of 0{N ! ), the Gibbs 
sampling approach could in principle be employed for all 
angular scales, thus eliminating the need for any hybrid 
approximation. Unfortunately, in practice this method 
is in its current implementation limited to low angular 
scales for two reasons: First, joint CMB analysis and 
component separation is currently implemented in terms 
of pixel-based fits of physical foreground models, requir- 
ing all frequency bands to have the same angular resolu- 
tion, dictated by the coarsest resolution in a given data 
set. Second, although the computational scaling for the 
Gibbs sampler is acceptable, the prefactor is high. The 
2013 Planck likelihood employed 100 000 Gibbs samples 
in order to achieve robust Blackwell-Rao convergence, 
and each of those samples required ~2000 Conjugate 
Gradient iterations (and twice as many spherical har- 
monic transforms) to converge, for a total cost of 500 000 
CPU hours. Naively scaling this to full Planck resolution 
suggest a final cost of 0(10*) CPU hours. 

The main result of the present paper is a statistically 
well motivated block factorization of the CMB power 
spectrum likelihood that is applicable to several of these 
problems. Specifically, we show that for sets of random 
variables that can be arranged sequentially in such a 
way that all correlations have a finite range within the 
sequence, the full joint probability distribution may be 
written in terms of lower-dimensional marginals. The 
arch-typical example of such a distribution is a multi- 
variate Gaussian with a strictly banded covariance ma- 
trix, and we therefore call the general (non-Gaussian but 
conditionally limited) case also "banded" . With this sta- 
tistical identity ready at hand, we first suggest a sta- 
tistically well-motivated likelihood hybridization scheme 
that takes properly into account correlations between the 
low- and high-^ regimes, and, second, we show how the 
convergence rate of the Blackwell-Rao estimator can be 
improved by factorizing the full high-dimensional mul- 
tivariate posterior into a set of lower-dimensional dis- 
tributions, each of which converges much faster than 
the full distribution. This approach differs from the di- 
rect Gaussianization technique proposed by Rudjord et 
al. (2009) in that the underlying probabilistic structure 
(e.g., shapes of marginal and TV-point correlations) is 
conserved; in principle, the only modification to the full 
likelihood enforced by our new approach is that assumed 
negligible correlations are explicitly set to zero. 



2. FACTORIZING THE CMB LIKELIHOOD 

2.1. Factorization of handed probability distributions 

We begin with a general joint probability density 
Pi{^}) — P{&i,S2,03, . . . ,9n) for a set of random vari- 
ables, Ok, with k — 1,2,3, ... ,n. We choose one specific 
sequential ordering of these variables (out of all the pos- 
sible orderings), and use the definition of a conditional 
to write the joint distribution as a product of univariate 
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Fig. 1. — Angular power spectrum correlation matrix, M^^i , 
for the official Planck low-£ CMB data set, estimated by Monte 
Carlo sampling. Note that any two-point correlations are contained 
within a band of l\l ^ 15, suggesting that the CMB likelihood may 
be approximated as a banded probability distribution. To factorize 
the CMB likelihood into lower-dimensional elements, we partition 
the full multipole range into a set of disjoint blocks such that all 
non-zero covariance elements are embedded within a tri-diagonal 
block structure, indicated here by colored squares. 

conditionals, 

P{{e})^ P{e,,92,03,...,9n) 
= Pi9l\92,e3,...,0n) 

■ P{e2\e3,...en)--- 

■ P{9n^l\9n) ■ P{0n) 

We then assume that our variables only have a condi- 
tional probability dependence on their immediate neigh- 
bors in the sequence, i.e., that the probability distribu- 
tion is tri- diagonally banded, 
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Thus, this simple derivation shows that a strictly (tri- 
diagonally) banded probability distribution may be fac- 
torized recursively into a product of uni- and bivariate 
marginals. 

Before applying this expression to CMB likelihood ap- 
proximation, we note that even if the joint probability 
distribution do not have correlations exclusively between 
neighboring variables, it may still be possible to factorize 
it, provided at least some correlations may be ignored. 
For instance, suppose we can ignore all but the nearest 
two neighbors; in that case, the joint distribution will fac- 
torize into a product of uni-, bi- and trivariate marginals. 

2.2. Block factorization of the CMB likelihood 



In its most basic representation, a CMB data set, d, 
may be modelled as 

d = s + n, (2) 

where s is the true sky signal and n represents instrumen- 
tal noise. Both the signal and noise are usually assumed 
to be zero-mean Gaussian variables with covariances S 
and N, respectively. 

The noise covariancc matrix is typically given by ex- 
ternal knowledge about the instrumental noise character- 
istics and the scanning strategy of a given experiment. 
The signal covariance matrix, on the other hand, is gen- 
erally unknown, and must be estimated from the data. 
However, given the fact that we only have one observable 
sky available, it is impossible to estimate the N^^^ ele- 
ments in S from the A^pix elements in d without imposing 
strong priors on its structure. The most commonly ac- 
cepted prior is simply that the CMB sky is isotropic and 
homogeneous (e.g., Planck XXIII 2013). It is therefore 
convenient to expand s in spherical harmonics, such that 



s(n) =^aimYem{n) 



(3) 



£,m 



where n is a unit vector pointing to a given position on 
the sky, Yi^ are the spherical harmonics, and aim are the 
corresponding spherical harmonics coefficients. Then the 
signal covariance matrix may be written as 



^£m,£'m' 



{O-lmagi. 



Cg6wSr, 



(4) 



where Cg is known as the angular power spectrum. 

The main goal of most CMB experiments is precisely 
to measure the CMB power spectrum, and the most 
straightforward way to do so is by maximum-likelihood 
estimation. Since we have assumed that both signal and 
noise are Gaussian distributed, the CMB power spectrum 
likelihood simply reads 



C{Ci) = P{d\Ce) oc 



-id'(S(Cf)+N)-M 



VWC~i 



Nl 



(5) 



where S = S(C^) is the covariance matrix given in Equa- 
tion 4 expressed in pixel domain. Note that Cg denotes 
the set of all power spectrum coefficients, and the likeli- 
hood therefore spans an ^^ax-dimensional space. 

As already mentioned, brute-force evaluation of Equa- 
tion 5 scales computationally as ©(iVpj^), and is there- 
fore feasible only for very low angular resolutions. Much 
of the CMB analysis literature therefore revolves around 
finding computationally tractable approximations to this 
expression. 

In order to build up some intuition about the correla- 
tion structure oi C{Ce), it is useful to plot the correlation 
matrix 



Mf, 



{{C,~{Cf)){Cfj-{Ci.)) 



(6) 



Figure 1 shows this matrix for the official Planck low- 
i CMB data, as evaluated from 200,000 Monte Carlo 
samples generated with a CMB Gibbs sampler (Eriksen 
et al. 2007). In this case, there are significant correlations 
between all elements at ^ < 20, while at ^ > 50 any 
correlations are well contained inside a band of Ai? = 15; 
any correlations beyond /S.i > 30 are well below 1%. 



Higher-order correlations are significantly smaller than 
these two-point correlations. 

For typical sky cuts and instrumental noise character- 
istics, the basic CMB likelihood can therefore be ap- 
proximated as a banded probability distribution with 
a bandwidth of Ai < 15, and can therefore in prin- 
ciple be factorized by Equation 1. However, as cur- 
rently written this expression only applies to a strictly 
tri-diagonal covariance matrix. To circumvent this prob- 
lem, we therefore introduce an auxiliary block structure 
that embeds all non-negligible elements within a larger 
tri-diagonal structure, as illustrated by the colored blocks 
in Figure 1. That is, we define a set of multipole blocks 
such that 01 = {Q„i„, ...,Ci,}, 62 = {Ci,+i, . . . ,C£j, 
. . . , 9n = {C£„_j+i, . . . , Ci^^^}. Thus, each univariate 
marginal in Equation 1 is replaced with a multivariate 
distribution of dimension £i — (.i^i, and each bivariate 
marginal is replaced with a multivariate distribution of 
dimension £i — ii-2- This block- wise factorization consti- 
tutes the main result of this paper, and in the following 
sections wc will apply this to two concrete problems in 
CMB likelihood estimation. 

3. ACCURATE HYBRID CMB LIKELIHOOD ESTIMATION 

As already mentioned, both Planck and WMAP have 
adopted so-called "hybrid" likelihood approximations, 
combining a Gibbs sampling based Blackwell-Rao esti- 
mator at large angular scales with a Gaussian (and/or 
log-normal) pseduo cross-spectrum approximation at 
small angular scales. These two components are merged 
into a single expression at the log-likelihood level. The 
Planck likelihood simply adds the two log-likelihoods 
(Planck XV 2013), adopting a so-called "sharp transi- 
tion" between the low- and high-£ regimes, schematically 
illustrated in the left panel of Figure 2. This is the sim- 
plest possible approach, and assumes that any correla- 
tions across the transition multipole are negligible. The 
WMAP likelihood makes a different choice, by including 
the off-diagonal terms between the low- and high-£ blocks 
in the (Gaussian plus log-normal) high-i? likelihood, as il- 
lustrated in the middle panel of Figure 2. 

In this section, we introduce a new and statistically 
better motivated approach than either of two employed 
by Planck and WMAP, taking advantage of the block 
factorization derived in Equation 1. The first step in 
our approach is to partition the full multipole range be- 
tween ^niin and ^niax into three disjoint regions, L = 
{£min, • • • , Aow}, T = {flow -l- 1, . . . , 4igh " 1} and H = 
{^high, • • • : ^max}. Corresponding to a low-£ region, a tran- 
sition region and a high-i? region, respectively. The width 
of the transition region is chosen to be at least as wide 
as the effective bandwidth of the Ce covariance matrix 
(see Figure 1). With this partitioning, we now specialize 
Equation 1 to the case with n — 3 regions; 

\og£{Ci) ^\ogC{L,T) +\og£{T,H) -log£{T). (7) 

Note that this approximation is exact under the assump- 
tion of vanishing correlations between the low- and high- 
£ regions, which can be ensured simply by letting the 
transition region be sufficiently wide. This estimator is 
schematically illustrated in the right panel of Figure 2. 

Equation 7 has a simple intuitive interpretation: The 
log-likelihood is simply the sum of a low- and a high-£ 
contribution, defined such that they overlap over a suffi- 
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Fig. 2. — Schematic overview of the three hybridization schemes discussed in the text. The left panel illustrates a sharp transition 
between the low- (BlackwcU-Rao) and high-£ (MASTER) likelihood, as currently adopted by Planck. The middle panel illustrates the 
WMAP approach, which includes the off-diagonal elements between the low- and high-^ regions in the high-£ likelihood estimator. The 
right panel illustrates the new estimator proposed in this paper, in which correlations are accounted for through an transition region that 
is sufficiently wide to include all non-negligible correlations between the low- and high-^ regions. To avoid double-counting of the diagonal 
elements, the total log-likelihood is corrected by the log-likelihood including elements within the transition region only. 



TABLE f 

Summary of cosmological parameters derived with three different hybridization schemes; 

the original wmap approach including off-diagonal elements in the inverse covariance 

MATRIX {second column); A SHARP TRANSITION AT ^THANs = 32 (third column); AND THE NEW 

APPROACH IMPLEMENTING A TRANSITION REGION BETWEEN i = 21 AND 32 (fifth column). THE FOURTH 

AND SIXTH COLUMNS SHOW THE RELATIVE SHIFTS WITH RESPECT TO THE WMAP APPROACH MEASURED 

IN UNITS OF (7. 





Default WMAP 


Sharp transition 


Transition 


region 




Constraint 


Constraint 


Deviation (cj 


1 Constraint 


Deviation (a) 


nth^ 


0.0225 ± 0.0006 


0.0225 ±0.0006 


0.02 


0.0225 ±0.0006 


0.02 


n„,h^ 


0.111 ±0.005 


0.111 ±0.005 


0.01 


0.112 ±0.006 


0.05 
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1.039 ±0.003 


1.039 ±0.003 


0.04 


1.039 ±0.003 


0.05 


T 


0.088 ±0.015 


0.088 ±0.015 


0.04 


0.088 ±0.015 


0.05 


ns 


0.969 ±0.013 


0.969 ± 0.014 


0.03 


0.968 ±0.014 


0.06 


log[10iOA,] 


3.08 ±0.04 


3.08 ±0.03 


0.03 


3.08 ±0.04 


0.05 



Note. — The confidence intervals are 1 a, and the best-fit points are the marginalised means of the 
parameters. 



ciently wide multipole range that all non-negligible cor- 
relations are included. However, because the diagonal 
block in the transition region is included twice, both 
by the low- and the high-£ likelihood, one must sub- 
tract the corresponding marginal for the transition re- 
gion once to avoid double-counting (this is also an imme- 
diate consequence of equation 1, under the assumption 
that p{L\T,H) = piL\T) = p{L,T)/P{T), i.e. the low-£ 
region is conditionally independent of the high-^ region 
given the transition region) . Note that any estimator for 
the transition likelihood may be used for the correction 
term, typically by extracting the relevant range from ei- 
ther the low- or the high-£ likelihoods. 

To assess the importance of the specific strategy 
adopted for hybridization, we modify the (7-year) 
WMAP likelihood to include each of the three solutions, 
and derive constraints on the standard ACDM model us- 
ing WMAP data only. The transition multipole is set 
to ^trans = 32 for the sharp transition case, whereas the 
transition region is defined as ^ = {21,..., 32} for the 
new hybrid scheme. The WMAP Blackwell-Rao estima- 
tor is used both for the low-^ and the transition regions 



in the latter case. We adopt ilb/i^, Jlm/i^, 6*, r, n^., and 
\og{10^^ As) as our primary parameters, and adopt Cos- 
moMC (Lewis & Bridle 2002) as our MCMC engine. The 
resulting one-dimensional marginals are shown in Fig- 
ure 3 for all three cases, and posterior mean summary 
statistics are given in Table 1. 

With a largest relative difference between any two cases 
of 0.06ct, these results demonstrate that the standard six- 
parameter ACDM model is highly robust with respect 
to assumptions about the correlations across the transi- 
tion regime. Similar conclusions were found when per- 
forming an identical analysis for the the recently released 
Planck likelihood (Planck XV 2013), and this motivated 
the choice of a sharp transition for that particular im- 
plementation. For future experiments and analyses we 
nevertheless recommend the hybrid approach presented 
here, for two main reasons. First, our expression provides 
a statistically well motivated solution whose validity may 
be monitored directly through the C^ covariance matrix; 
without the same level of statistical rigour, detailed sim- 
ulations are more critical for the other two approaches, 
and these should in principle be repeated both when the 
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Fig. 3. — Comparison of best-fit parameters derived by CosmoMC from WMAP using likelihood approximations based on the new hybrid 
estimator presented in this paper (solid black line); the WMAP approach including off-diagonal elements in the inverse covariance matrix 
{dashed red line); and a sharp transition between the low- and high-^ regions (dotted blue line). 



data set or the parametric model is changed. Second, 
this expression is implementationally trivial once both 
low- and high-£ likelihoods are available, and there is 
therefore no practical reason for not including these cor- 
relations, even if their impact may be small. 

4. FASTER BLACKWELL-RAO CONVERGENCE 
4.1. Review of the Blackwell-Rao estimator 

As mentioned in Section 1, both the Planck and 
WMAP low-^ likehhoods (Planck XV 2013; Hinshaw et 
al. 2012) employs a specific Blackwell-Rao (BR) estima- 
tor to produce an accurate likelihood approximation that 
accounts for all correlations and non-Gaussian structures 
(Chu et al. 2005). The main advantages of this estimator 
are 1) computational speed, 2) implementational simplic- 
ity, and 3) support for seamless marginalization over sys- 
tematic effects and component separation errors through 
Gibbs sampling (Eriksen et al. 2007). 

This estimator may be explained intuitively as follows: 
Suppose it is possible to construct an experiment that 
provides a perfect full-sky noiseless image of the CMB 
sky, d = s. For that experiment, the only source of 
uncertainty on Ce is cosmic variance, and the exact CMB 
likelihood in Equation 5 reduces to an inverse Gamma 
distribution, 



^o{Ci 



-^s*S(Ce)-^s 



^/wm\ 
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c: 



(8) 



P to be the 



Here we have defined a^ = 21+1 Y^T=~m. I^-^" 
realization specific power spectrum of s. 

However, for any real experiment there are additional 
sources of uncertainty beyond cosmic variance, for in- 



stance from instrumental noise and foreground contami- 
nation, and P(s|d) is no longer a delta function. In or- 
der to account for this additional uncertainty, one must 
weight the ideal likelihood in Equation 8 with respect to 
P(s|d), 



CBKiCi)^ dsCoiC,)P{s\d). 



(9) 



At first glance, this integral appears difficult to evaluate, 
as it involves millions of degrees of freedom. However, 
this is precisely where the CMB Gibbs sampler enters the 
picture. As explained in detail by Jewell et al. (2004); 
Wandelt et al. (2004); Eriksen et al. (2004, 2007), the 
output from this algorithm is a set of samples drawn 
directly from P(s|d), accounting for both instrumental 
noise and foreground errors. Thus, the integral can be 
simply evaluated by Monte Carlo integration as a sum 
over these samples, 



•^br(C£) 
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n 



2^+1 Za. 
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(10) 



This is the CMB power spectrum Blackwell-Rao estima- 
tor, which is guaranteed to converge to the true likelihood 
in the limit of iVgamp -^ 00. 

4.2. Lifting the "curse of dimensionality" 
by block factorization 

While the Blackwell-Rao estimator is guaranteed to 
converge to the correct answer, it is not obvious how fast 
it does so, as measured in terms of number of samples 
required for convergence, A'samp. Further, since the com- 
putational cost of a single Gibbs sample is typically on 
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Fig. 4. — Illustration of the "curse of dimensionality". The 
Blackwell-Rao estimator builds up a smooth histogram from a fi- 
nite set of Monte Carlo samples by assigning a distribution (or 
kernel) to each sample. The number of samples required to reach 
convergence is proportional to the ratio between the volume of the 
kernel (blue) and the volume of the full distribution (black). If this 
ratio is r < 1 in one dimension (top and left panels), it is r^ in two 
dimensions (central panel), and r" in n dimensions. This implies 
that the number of Monte Carlo samples required to reach conver- 
gence for the CMB BR estimator scales exponentially with Imax- 
The evaluation of the 2-d likelihood at a specific point in param- 
eter space (red cross) will be much more sensitive to the number 
of samples than the corresponding evaluations in the respective 
marginalized parameter spaces (red lines). 

the order of several CPU hours (Eriksen et ah 2004), de- 
pending on the angular resolution and/or signal-to-noise 
ratio of the data set under consideration, it is important 
to understand this scaling before attempting a full-scale 
analysis. Indeed, Chu et al. (2005) showed that iVsamp 
scales exponentially with ^max, effectively limiting its op- 
erational range to ^max ~ 50-70. The main goal of the 
present section is to improve on this limit, and extend 
the BR estimator to high ^'s. 

To understand the origin of the exponential scaling, 
we show in Figure 4 a simple two-dimensional Gaussian 
distribution mapped by a Monte Carlo sampler. The 
top and left panels show the respective one-dimensional 
marginals. The Blackwell-Rao estimator establishes a 
smooth approximation to these distributions by assign- 
ing a kernel of finite width to each individual Monte 
Carlo sample (illustrated by blue contours/Gaussians) 
before taking the average over all samples. Suppose now 
that the width of the one-dimensional kernel is 10% of 
the width of the marginal distribution; in that case, one 
needs ^10 samples in order to cover the marginal once. 
In two dimensions, however, one needs ~10^ samples to 
cover the full joint distribution once, since the ratio now 
is only 10% in each of the two directions. More generally, 
in n dimensions one would need ~10" samples. This is 
a variation of the well-known "curse of dimensionality" , 
which says that the number of points required to cover 
an n-dimensional space scales exponentially with n. 

The BR estimator given in Equation 10 converges well 
up to £ ~ 30 with only a few thousand samples for 
WMAP (Chu et al. 2005), while for Planck it is found to 
be robust up to ^ « 70 with 100 000 samples (Planck XV 
2013). To extend to even higher ^'s by brute force would 



soon require a prohibitively large number of samples, as 
the computational cost for the Gibbs sampling step of 
the latter case is already half a million CPU hours. 

Fortunately, the block factorization presented in Sec- 
tion 2 may be used to define an alternative and compu- 
tationally much cheaper algorithm: 

1. Partition the full ^max-dimensional C{Ci) into a se- 
quence of lower-dimensional blocks, r^, for instance 
of width A£. 

2. Use the standard BR estimator to estimate the 
marginal likelihood for each block and each neigh- 
boring set of two blocks. 

3. Merge these block marginals into a single all-£ es- 
timator through the block factorization in Equa- 
tion 1. 

Thus, our new likelihood approximation can be written 
succinctly on the following form. 



C{Ct) 



uVi^ 



BR 



irk,rk+i) 



Ill=2 ^BR 



irk) 



(11) 



Note that all the likelihood evaluations on the right side 
of this expression involve a maximum of 2Ai — 1 dimen- 
sions, as opposed to £max — ^min + 1 for the full joint BR 
estimator, effectively lifting the curse of dimensionality. 

4.3. Accuracy and convergence 
4.3.1. Methodology 

Before the block factorized BR estimator can be used 
for real analysis, it is necessary to assess its accuracy 
and convergence properties. To this aim, we analyze two 
different simulations with the above machinery, adopt- 
ing the convergence analysis methodology of Chu et al. 
(2005), but implementing a few minor changes to im- 
prove the reliability of the convergence statistics. Monte 
Carlo samples are produced with Commander (Eriksen 
et al. 2004, 2007). 

The first simulation consists of a full-sky high- 
resolution (A^sido — 512, ^niax — 1024, 14' Gaussiau 
beam) data set with uniform noise (65 ^K RMS per 
pixel). The main advantage of this case is that the Cg 
likelihood (Equation 5) factorizes in £, and can be eval- 
uated analytically. 



Adeal(C^) « JJ 



2 (Cf+JVf) 



(Ce + Ne)^ 



(12) 



where a^ is the angular power spectrum of the noisy sky 
map, and Ni is the ensemble averaged noise power spec- 
trum. The second simulation consists of a low-resolution 
(^sidc = 32, Cax = 95, 4° FWHM Gaussian beam) 
data set with the WMAP KQ85 sky cut imposed, re- 
moving 25 % of the sky. White noise of 5 /iK RMS is 
added to each pixel, resulting in a signal-to-noise of unity 
at £ w 70. The main purpose of this simulation is to 
study the effect of correlations between different multi- 
poles arising from the sky cut through comparison with 
brute-force pixel-space likelihood evaluation. However, 
because of the brute-force evaluations, this case is neces- 
sarily limited to low angular resolution. 



= 300 




Fig. 5. — Comparison of four different methods of evaluating a simple amplitude-tilt likelihood for a full-sky simulation: The analytic 
case, the full Blackwell-Rao case, and two versions of the hybrid likelihood described in this paper - with A£ = 1 and 5, respectively. 



The CMB signal is drawn from a Gaussian distribution 
with a covariance given by the best-fit WMAP ACDM 
power spectrum, C^°^ (Hinshaw et al. 20f2). In each 
case, we fit a two-parameter amphtude-tih {A-n) model 
on the form 



Ct{A,n)=A 



4 



CI 



(13) 



where Iq ~ ^max/2, simply by mapping out L(A, n) over 
a two-dimensional grid. For £min — 2, this choice of pivot 
multipole ensures a low degree of correlation between A 
and n. 

To assess both convergence and accuracy, we adopt the 
following measure of difference between two likelihoods, 
£i and £2 (Chuet al. 2005), 



\Li{A,n) - L2{A,n)\ dAdn. 



(14) 



One can show that if Ci and £2 are two bivariate Gaus- 
sian distributions with the same covariance matrix, S, 
but different means, fli and fl2, then 



9 = * (^v/(Aii - A'2)S-n/^i - M2) 



(15) 



where $ is the cumulative standard normal distribution 
function. From this, one finds that a O.lcr shift in a 
Gaussian distribution corresponds to q ^ 0.05. In the 
following, we therefore define two distributions to agree 
if g< 0.05. 

For the accuracy assessment, we simply compare the 
block factorized BR likelihood with the exact case. Con- 
vergence assessment, however, is done by drawing two 
disjoint sample subsets from the full set of available 
Monte Carlo samples, compute the BR estimator from 



each subset, and compare the resulting likelihoods. We 
then increase the number of samples in the two subsets, 
^samp, until q is consistently lower than 0.05 even when 
adding 100 additional samples; the latter criterion is im- 
posed in order to avoid chance agreement. Finally, we 
repeat this calculation a certain number of times with 
different sample subsets (but drawn from the same full 
sample set), and report the median of the resulting values 
of Nsurnp as thc final estimate of the number of samples 
required for convergence. 

4.3.2. Results 

Figure 5 shows C(A, n) evaluated from the high- 
resolution full-sky simulation for nine different values of 
^max with four different likelihood expressions; analytic, 
standard BR, and two variations of the block-factorized 
BR estimator. A total of A'samp = 28, 000 samples are 
included in the two latter, a choice that is set to highlight 
the fundamental difference between the various cases. In 
particular, since there are no correlations between any 
multipoles in this case, all four approaches are in princi- 
ple exact, and the only difference among the four cases 
are their relative convergence rates. 

For £max < 300, we see that all four estimator agree to 
very high accuracy. However, from £max > 400 the full- 
range BR likelihood starts to diverge. At fmax = 900, 
it is separated from the analytic result by more than 
15ct. In this case, the sum in Equation 10 is strongly 
dominated by the one sample that happens to have the 
lowest power spectrum scatter about some best-fit mode, 
and the resulting distribution is simply an imprint of the 
cosmic variance kernel (Equation 8) for that sample. 

The block factorized BR estimators remain valid to 
higher ^max, demonstrating how the "curse of dimension- 
ality" is lifted by breaking the full parameter space into 
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Fig. 6. — Comparison of five different methods of evaluating 
a simple amplitude-tilt likelihood for a cut-sky simulation: The 
pixel-based case, the full Blackwell-Rao case, and three versions of 
the hybrid likelihood described in this paper - with Af = 5, 10, 
and 15, respectively. 



smaller regions that are easier to handle. In particular, 
the case with A£ = 1 agrees with the analytic case even 
at £max = 900 to ~0.3cr. 

In Figure 6 we show similar results for the low- 
resolution simulation for which 25 % of the sky is removed 
by masking, but this time comparing with the brute- 
force pixel-based likelihood estimator, and this time us- 
ing -/Vgamp = 60, 000 samples. Again, we see that all cases 
agree to better than O.lcr, even for the factorized BR esti- 
mator with A£ — 5, demonstrating the accuracy of both 
the full and the factorized BR estimators, even with very 
small block sizes and for the fairly large WMAP mask. 

Next, in the top panel of Figure 7 we plot the num- 
ber of samples required for convergence according to the 
above criterion for the high-resolution full-sky simula- 
tion described above, and in the bottom panel we show 
the same, but after applying the WMAP mask, in order 
to introduce a realistic multipole correlation structure. 
The upper vertical limit in these plots is set by the finite 
number of samples included in the analysis. 

In all cases we see the same qualitative behaviour: Re- 
ducing the dimensionality of the BR estimator through 
block factorization greatly improves the convergence rate 
by reducing the required number of samples by orders of 
magnitude at high ^'s. For instance, for the full-sky case 
and with a block size of Ai = 6, only 10'^ samples are 
required in order to reach convergence up to ^max — 500, 
whereas the full BR estimator would require 10^. For the 
25 % WMAP mask, about lO"*^ samples are required for 
^max — 200, while it is difficult to establish any sensible 
estimate for the full BR estimator in this case. (Note 
that the high-^ projection for the latter case, marked by 
a dashed line, is based on linear extrapolation from a 
few low-^ points, since convergence was not reached at 
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Fig. 7. — Convergence analysis for the split Blackwell-Rao es- 
timator, with convergence defined in sec. 4.3. The samples come 
from running Commander on a full-sky simulation. We show the 
median of the number of samples needed for convergence for a given 
'^max, along with the best-fit regression line in logj^Q-space. The me- 
dian is computed from 10 (top) and 1024 (bottom) runs where the 
samples are scrambled between each run. The regression lines are 
dotted when they extend past the available data points. The high 
number of runs per data point for the bottom plot is also the rea- 
son for the more sparse sampling - each data point represented a 
very high computational cost, and so the number of data points 
were reduced. 

all within the current sample set at higher multipoles. 
This projection is therefore associated with a very large 
systematic uncertainty.) 

5. CONCLUSIONS 

The main result presented in this paper is a statis- 
tically well motivated block factorization of the CMB 
power spectrum likelihood. Because the spherical har- 
monics are nearly orthogonal over the large sky coverages 
achieved by current CMB satellite experiments such as 
Planck and WMAP, any correlations between different 
CiS are localized in multipole space. Under the assump- 
tion that these probabilistic dependencies have a strictly 
finite range, the full CMB likelihood may be reduced into 
a product of lower-dimensional marginals. 

We have applied this result to two outstanding prob- 
lems in CMB analysis. First, we use this expression to 



derive a well-motivated hybrid CMB likelihood estima- 
tor, merging an exact low-i? component with an approx- 
imate high-^ component, that accounts for correlations 
between the two regions. Although a detailed analysis of 
the WMAP likelihood shows that these correlations are 
negligible for the WMAP sky cut and the six-parameter 
ACDM model, we nevertheless recommend this new esti- 
mator for future experiments and analyses, both because 
its implementation is trivial, and because it provides ad- 
ditional safety when analyzing non-standard models. 

Second, we have shown how the same expression 
may be used to accelerate the convergence rate of the 
Blackwell-Rao CMB likelihood estimator by orders of 
magnitude at high ^s. This is achieved by factorizing the 
full parameter space into subspaces that each individu- 
ally converge faster, and then merging these sub-blocks 
into a full-range estimator at the likelihood level using 
the block factorization formula. 

It should be noted that these results rely directly on the 
assumption of vanishing long-range correlations. While 
this assumption holds to a very high accuracy for the 
basic CMB signal plus noise data model, it is in general 
not valid when including systematic effects in the analy- 
sis. Perhaps the two most important examples are corre- 



lated beam uncertainties and unresolved extra-Galactic 
point sources, each of which extend through all £'s (e.g., 
Planck XV 2013). Fortunately, these long-range degrees 
of freedom may be modelled in terms of a small num- 
ber of power spectrum templates, each with an unknown 
amplitude. One can therefore marginalize over these by 
sampling the unknown amplitudes as nuiscance param- 
eters, similar to what was done for high-£ astrophysical 
parameters in the 2013 Planck likelihood (Planck XV 
2013). 

Finally, we note that the block factorization presented 
in Section 2 is a completely general statistical result that 
holds exactly for any banded probability distribution, 
and we therefore expect it to also find applications out- 
side the CMB field. 
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